GATEWAY DEVICE FOR REMOTE FILE SERVER SERVICES



Field of the Invention

The present invention relates to computer networks, and particularly, although not exclusively, to a method and apparatus for providing remote data storage for one or more computers over a communications network.



Background to the Invention 

Conventionally, in a network of computers, for example a corporate network, primary data storage tends to be provided by one or more file server and/or application server devices at the same geographical location.



A user running a plurality of conventional file servers across a company network must manage the server hardware in addition to performing normal user management. Conventional file server based local area networks are not readily scalable without reconfiguration of the file servers. For example, users may have to be transferred from one file server to another, and the file structures on the file servers need to be managed to ensure a smooth migration of users, as well as different security levels and user accesses. Maintaining capacity in a file server based local area network of computers can therefore become management intensive.

Known storage area networks (SANs) offer a potential solution to this problem. However, they tend to be economically feasible only for very large corporations which can afford high-end enterprise storage infrastructure. A small company with of the order of 100 or 200 computer users which needs an extra few terabytes of data storage must instead buy a whole set of new servers, configure, maintain and manage them, and then manage its users across all the servers.

An alternative solution to data storage for individual computer users, or users of networks of computers, is to provide the user with a network connection over which they can store files remotely, instead of the user buying and maintaining their own file servers. Such a network connection would link to a remote data storage facility and may potentially give a user a much lower cost of ownership per gigabyte of file storage than buying and maintaining their own file servers. A service provider running the data storage facility would take on responsibility for data protection.

One problem with providing a remote file server service is the bandwidth of the network connection between the user and the service provider. This network connection needs to be very high performance in order to handle all the read and write traffic from users to a centralized remote file server service. This is not only expensive, but also difficult to deploy. In practice, there is a limited amount of data transmission capacity over which to pass large amounts of data back and forth between a computer and a centralized data storage facility.

A second problem is that a service provider operating a data storage facility has no knowledge of how a user wishes to use the data storage facility at the user's end of the network connection. Data storage is conventionally used with features such as a file structure, security, user accesses and the like. The service provider therefore faces the problem of accommodating the flexibility of users' own configurations of the data storage space, for a plurality of different users.

Summary of the Invention

Specific implementations of the present invention aim to provide a remote data storage service which can use a relatively low data rate networking connection, but still provide fast read and write access to user files. Here "low" means a low data rate compared with the data rates available over prior art local area network connections, such as the Ethernet connections found in many prior art local area networks. There is provided a file server service gateway appliance which interfaces between a customer and a data storage service provider via a network connection, for example an integrated services digital network (ISDN) line or a T1 connection.

Using a specific implementation of the present invention, a customer may request the service provider of the data repository to make available an extra quantity of data storage space, e.g. a terabyte or so, in the data repository. Ideally, from the customer's point of view, the amount of data storage simply expands, without the problem associated with prior art network data servers of moving users between different file servers. This makes the cost of using bulk data repository facilities attractive, provided the problem of limited data capacity on the communications links can be satisfactorily solved.

In specific implementations of the present invention, a network user may specify, from the client network, the configuration of a remote data block in a data repository, giving different users permissions to different files and specifying that the data storage space should support their particular operating system, for example Windows NT, Unix or the like. Effectively, management of a data block, once allocated to a customer, is performed by the customer. The large volume of data storage in the data repository is divided into a plurality of blocks allocated to different customers, and each customer manages the file storage within their own data block.

The problem of restricted data capacity between the data repository and the gateway appliance is overcome by local caching of data at the gateway appliance prior to sending compressed data transmission files, comprising user data and a file header, over the communications link. Data is stored in the data repository in compressed format. Data files are transmitted at user-definable periodic intervals, and local caching of user data enables recently written user data files to be recovered without retrieving data from the data repository over the communications link. Further, incremental changes to written data files which are stored in the local gateway appliance cache are periodically collected together and sent to the data repository, where they are stored as incremental data files without being merged with the original data files.

According to a first aspect of the present invention, there is provided a method of storing user data of a plurality of network computer entities, said method characterized by comprising the steps of:

writing said user data to a local data storage area (1001) in a said computer entity;

creating an emulation data which emulates a file system type in use in said network;

incorporating said user data and said file system type data in a data file for transmission; and

transmitting said transmission file over a communications link for remote data storage.

According to a second aspect of the present invention there is provided a method of preparing data originating from a plurality of networked computer entities into a format for remote storage, said method comprising the steps of:

assembling a file of user data to be remotely stored;

assembling a header data (1102), said header data comprising:

an address data (401) identifying an address of a device from which said data is sent;

a file system type data (400) identifying a file system type which is used by the device from which the data is sent;

an access control data (404) describing at least one category of user who is authorised to access said user data files;

a timing data (405) identifying a time associated with said user data file; and

appending said header data (1103) to said user data file to create a transmission file comprising said user data file and said header data.
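
The header of the second aspect can be illustrated with a minimal Python sketch. The field names, the JSON encoding of the header and the 4-byte length trailer that lets the receiver separate the appended header from the user data are all assumptions made for illustration; the description does not fix a wire format.

```python
import json
import struct
from dataclasses import dataclass, asdict


@dataclass
class HeaderData:
    # Hypothetical field names; the numerals refer to the header items above.
    address: str           # (401) address of the device sending the data
    fs_type: str           # (400) file system type used by that device
    access_control: list   # (404) categories of user authorised to access
    timestamp: float       # (405) time associated with the user data file


def make_transmission_file(user_data: bytes, header: HeaderData) -> bytes:
    """Append the header to the user data file. A 4-byte big-endian length
    trailer (an assumed format) lets the receiver split the two parts."""
    header_bytes = json.dumps(asdict(header)).encode("utf-8")
    return user_data + header_bytes + struct.pack(">I", len(header_bytes))


def split_transmission_file(blob: bytes) -> tuple[bytes, HeaderData]:
    """Inverse of make_transmission_file."""
    (header_len,) = struct.unpack(">I", blob[-4:])
    header_start = len(blob) - 4 - header_len
    header = HeaderData(**json.loads(blob[header_start:-4].decode("utf-8")))
    return blob[:header_start], header
```

Appending rather than prepending the header keeps the user data at offset zero, so it can be written out unchanged once the trailer has been read.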

According to a third aspect of the present invention there is provided a gateway appliance for sending data to and receiving data from a remote data storage location accessible over a communications link, said gateway appliance comprising:

a data processor (1002);

a first communications port (1004) for communicating with a plurality of computers in a computer network;

a second communications port (1005) for communicating with a remote data storage facility;

a non-volatile data storage device (1001) for storing locally data to be communicated via said second port;

means (1001) for emulating a file system corresponding to a file system of a network of computer entities;

means for converting data between a file system dependent format and a file system independent format; and

means for converting said data between a compressed format and an uncompressed format.
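
The two conversion means can be illustrated with a minimal Python sketch. The attribute-name tables are hypothetical (the NetWare names in particular are placeholders, not real NetWare attribute names), and zlib stands in for whatever compression the appliance actually uses. Attributes with no generic equivalent are carried through under a file-system prefix so that a round trip through the independent form loses nothing.

```python
import zlib

# Hypothetical tables: file-system-specific attribute name -> generic name.
TO_GENERIC = {
    "NT": {"ReadOnly": "read_only", "Hidden": "hidden", "LastWriteTime": "mtime"},
    "NetWare": {"RO": "read_only", "H": "hidden", "ModifyDate": "mtime"},
}


def to_independent(fs_type: str, attrs: dict) -> dict:
    """File system dependent attributes -> file system independent form."""
    table = TO_GENERIC[fs_type]
    generic = {}
    for name, value in attrs.items():
        # Attributes without a generic equivalent keep a file-system prefix.
        generic[table.get(name, f"{fs_type}:{name}")] = value
    return generic


def to_dependent(fs_type: str, generic: dict) -> dict:
    """File system independent form -> attributes of the given file system."""
    reverse = {g: name for name, g in TO_GENERIC[fs_type].items()}
    prefix = f"{fs_type}:"
    attrs = {}
    for key, value in generic.items():
        if key in reverse:
            attrs[reverse[key]] = value
        elif key.startswith(prefix):
            attrs[key[len(prefix):]] = value
        # Prefixed keys belonging to other file systems are dropped.
    return attrs


def compress(data: bytes) -> bytes:
    return zlib.compress(data)


def decompress(data: bytes) -> bytes:
    return zlib.decompress(data)
```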

According to a fourth aspect of the present invention there is provided a bulk data storage facility comprising:

a plurality of data storage devices (500, 601);

a plurality of file servers (501, 602) configured for storing data in said plurality of data storage devices;

a plurality of gateway devices (502, 603) providing external connectivity to said plurality of file servers and adapted to receive packets of incoming data;

said bulk data storage facility characterized by comprising:

means (604) to allocate said plurality of incoming data packets to data storage space in said plurality of data storage devices; and

database means (1301) for recording a data location of each said plurality of data packets in said plurality of data storage devices.

According to a fifth aspect of the present invention there is provided a method of providing data storage to a plurality of customers at a bulk data storage repository, said method comprising the steps of:

receiving packets of data from each of said plurality of customers;

allocating (800) to each said customer at least one block of data storage space;

allocating to each said received packet a file location in said data storage space;

allocating to each said packet a file name; and

storing (802, 1407) said file name in a database, said database identifying said file location in said data repository associated with said data packet.
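
A minimal Python sketch of the fifth-aspect steps, using an in-memory SQLite table as the database. The block size, the linear allocation of space inside each customer block and the customer-plus-sequence-number naming scheme are assumptions for illustration only.

```python
import itertools
import sqlite3


class Repository:
    """Hypothetical model of the allocation steps, not the actual design."""

    def __init__(self, block_size: int = 1 << 30):
        self.block_size = block_size       # assumed fixed size per customer block
        self.next_block_start = 0
        self.blocks = {}                   # customer -> [start, end, next_free]
        self.sequence = itertools.count(1)
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE files (name TEXT PRIMARY KEY, customer TEXT,"
            " location INTEGER, size INTEGER)"
        )

    def allocate_block(self, customer: str) -> None:
        """Step (800): give the customer a block of data storage space."""
        start = self.next_block_start
        self.next_block_start += self.block_size
        self.blocks[customer] = [start, start + self.block_size, start]

    def store_packet(self, customer: str, packet: bytes) -> str:
        """Allocate a file location and a file name, then record them (802)."""
        _, end, free = self.blocks[customer]
        if free + len(packet) > end:
            raise RuntimeError(f"block allocated to {customer} is full")
        self.blocks[customer][2] = free + len(packet)
        name = f"{customer}-{next(self.sequence):08d}"   # assumed naming scheme
        self.db.execute("INSERT INTO files VALUES (?, ?, ?, ?)",
                        (name, customer, free, len(packet)))
        return name

    def locate(self, name: str):
        """Return (location, size) recorded for a stored file name."""
        return self.db.execute(
            "SELECT location, size FROM files WHERE name = ?", (name,)
        ).fetchone()
```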


Brief Description of the Drawings

For a better understanding of the invention and to show how the same may be carried into effect, there will now be described, by way of example only, specific embodiments, methods and processes according to the present invention with reference to the accompanying drawings in which:

Fig. 1 illustrates schematically a bulk data storage repository facility located geographically remotely from a plurality of corporate user networks, and connected to the corporate user networks over the internet;

Fig. 2 illustrates schematically a relationship between a bulk data storage repository and a single gateway appliance of a corporate user network, the gateway appliance being connected to the data repository via a communications link, e.g. the internet;

Fig. 3 illustrates schematically a data transmission file for transmitting data between a customer gateway appliance and the data repository of Fig. 2 over a communications link;



30 Fig. 4 illustrates schematically data types comprising a meta data header 

field of the data transmission fife of Rg. 3; 

Fig. 5 illustrates schematically a prior art server cluster having a bulk data storage device with high reliability, high redundancy and scalability;

Fig. 6 illustrates schematically a data repository according to a specific implementation of the present invention, comprising a prior art bulk data storage device controlled by a novel operating system;

Fig. 7 illustrates schematically an internal file structure of a data storage facility of Fig. 6 herein;

Fig. 8 illustrates schematically an overview of a first mode of operation of the data repository of Fig. 6, being a method for allocating data storage space to a particular gateway appliance of a customer;

Fig. 9 illustrates schematically a second mode of operation of the data repository of Fig. 6 herein, for receiving a data transmission block from a customer gateway appliance and storing data in a bulk data storage device;

Fig. 10 illustrates schematically a gateway appliance according to a specific implementation of the present invention, for linking a customer computer network to the data repository facility illustrated in Fig. 6;

Fig. 11 illustrates schematically an overview of a first method of operation of the gateway appliance of Fig. 10, for sending data to be stored in the data repository of Fig. 6 herein;

Fig. 12 illustrates schematically a data file containing configuration data of the gateway appliance of Fig. 10 herein, which may be stored as a data file in the data repository of Fig. 6 herein;

Fig. 13 illustrates schematically an architecture of management module 406 of the data repository; and

Fig. 14 illustrates schematically a third mode of operation of the data repository, upon receiving a data file from a gateway appliance.

Detailed Description of the Best Mode for Carrying Out the Invention

There will now be described by way of example the best mode contemplated by the inventors for carrying out the invention. In the following description numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without limitation to these specific details. In other instances, well known methods and structures have not been described in detail so as not to unnecessarily obscure the present invention.

Referring to Fig. 1 herein, there is illustrated schematically a computing system comprising a plurality of user networks 100, 106, each comprising a plurality of individual computing entities 101-103 connected together by a local area network, and a gateway device 104 for communicating over a communications link, for example the internet 105, with a bulk data storage apparatus 106 which may be located at a data repository facility 107 remote from the user network 100. The bulk data storage unit may store data from a plurality of corporate networks 100, 106, and serves as a centralized data storage facility for corporate data, replacing the need for individual corporations to purchase their own data storage devices.

The data repository 107 may be located anywhere in the world, and connected to the plurality of corporate networks 100, 106 via dedicated communications lines, for example virtual private networks (VPNs), or via the internet. In practice, the communications link between a corporate network and the data repository will not have unlimited data capacity, but will have capacity limits imposed upon it, either as a technical bit rate limitation or as a financial limitation on the purchase of bit rate and data capacity. It is therefore important to use the available bit rate capacity of the communications link between a gateway device 104 and the bulk data repository efficiently.

The data repository 107 comprises a large array of data storage devices, with associated processor capacity, providing a bulk data storage facility to a plurality of different computer networks, each of which may be run by a different corporation. The service provider owning and maintaining the data repository 105 provides, as a paid-for service, data storage to each of the persons managing the corporate computer networks 100, 106, with the advantage that the amount of data storage supplied to a corporation can be increased or decreased quickly in response to a customer requesting a greater or lesser amount of data storage.

A main reason for providing a data repository service is a lower cost of ownership compared with individual networked file servers. Further, high reliability, high redundancy and high availability are also advantages over conventional file servers provided on local area networks. Obtaining the same reliability and redundancy in a conventional local area network structure would incur higher costs for a user.

At each user network, there may be tens or hundreds of individual persons using the network, any of whom may wish to access the data in the bulk data storage repository 107. A single bulk data storage repository 107 may serve hundreds or thousands of individual user networks. With multiple users having multiple connections over multiple communication links, e.g. over the internet 105, if users were to configure the bulk data storage space 107 individually to suit their own data security policies and operating environments by sending configuration messages over the internet, there would be a huge problem in managing the incoming management traffic at the data repository. Transporting authorizations for dividing the data block, e.g. NT authorizations, across the internet should be avoided.

Referring to Fig. 2 herein, there is illustrated schematically a connection between a gateway appliance 200 and a data repository facility 201 over internet 202. Gateway appliance 200 serves a corporate computer network comprising a plurality of individual computer entities 203-206 which are connected via a local area network 207.

The purposes of the gateway appliance include:

• Providing a user with an emulation of a file server which integrates easily into a customer's existing network, for example to emulate an NT server for NT domains, a NetWare server for NDS networks, an NFS server for Unix networks and the like.

• Providing performance enhancements so that read and write traffic over a low speed network connection to the service provider is reduced to an absolute minimum without impacting a user's read/write performance to the emulated file server.

Gateway appliance 200 provides an abstraction of a data storage facility available to the user, such that users can configure their own storage management schemes from their own user networks. All of the complexity of individual user authorizations, including the details of which individuals can access which files, is dealt with by the gateway appliance 200. The data storage repository 201 supplies raw blocks of data storage capacity in response to requests from the gateway appliance.

Emulation of a local file system resident on a computer network is achieved by the gateway appliance providing emulations of the various file server file system types over local area network interfaces in the gateway appliance, and also by supporting integration into the various leading network security models, for example NDS, NT Domain and Active Directory. These emulated file systems are mapped to generic 'raw' file systems at the data repository, so that when a user writes a new file to an emulated file system, it is stored in the 'raw' file system at the repository along with the attributes specific to the emulated file system. Each user in a computer network who is allowed access to the gateway appliance may be assigned a private internal security identification for the 'raw' file system, and the gateway appliance converts between the local area network security user identifications and the internal identifications used in the 'raw' file system at the data repository.
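
The conversion between local area network security identities and private internal identifications might be kept as a simple two-way table, sketched below in Python. The starting ID, the realm strings (such as "NT:CORP") and the case-insensitive treatment of user names are assumptions, not details from the description.

```python
import itertools


class IdentityMap:
    """Hypothetical two-way map: (security realm, LAN user name) <-> internal ID
    used in the 'raw' file system at the data repository."""

    def __init__(self, first_id: int = 1000):
        self._next = itertools.count(first_id)
        self._to_internal: dict[tuple[str, str], int] = {}
        self._to_lan: dict[int, tuple[str, str]] = {}

    def internal_id(self, realm: str, user: str) -> int:
        key = (realm, user.lower())   # assumption: user names are case-insensitive
        if key not in self._to_internal:
            new_id = next(self._next)
            self._to_internal[key] = new_id
            self._to_lan[new_id] = key
        return self._to_internal[key]

    def lan_identity(self, internal: int) -> tuple[str, str]:
        return self._to_lan[internal]
```

Keeping the realm in the key allows the same user name in, say, an NT domain and an NDS tree to map to distinct internal identifications.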

Providing such an emulation scheme allows a user to resize the emulated file systems to any size they wish. For example, if a user is running out of space, the user can purchase additional file server capacity from the data repository service provider, and allocate this additional 'raw' capacity to existing emulated file systems, or create new file systems. This means there are no significant constraints on how much 'raw' capacity the user can use at the data repository, although a user with a large amount of capacity may wish to add further local area network interfaces to the gateway appliance to share the local area network traffic.

The gateway appliance uses a local data storage device as an advanced read and write cache to reduce the amount of network traffic between the appliance and the data repository. When a user writes a file to the emulated file system in the gateway appliance, the file is initially cached on the appliance data storage device. At regular intervals which are pre-settable by a user, for example hourly, any files changed since the last transmission to the data repository are sent to the data repository to be stored in the raw filing system. Means such as redundant file elimination, software compression and delta blocking may be used at the gateway appliance to reduce the traffic traversing the communications link to a minimum. In the data repository, new data is received and decompressed, and deltas are applied to files to bring them up to date with a user's latest file changes. If a user has made multiple changes to a file within a single transmission interval, these changes may be consolidated before being stored in the data repository.
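
The write path described above can be sketched in Python under stated assumptions: the user-set timer simply calls a hypothetical `flush()` method, consolidation within an interval is modelled by later writes overwriting earlier ones, redundant file elimination is approximated by skipping files whose SHA-256 digest matches the last transmitted version, and zlib stands in for the appliance's compression. Delta blocking is left out for brevity.

```python
import hashlib
import zlib
from typing import Optional


class WriteCache:
    """Hypothetical gateway write cache; flush() is called once per interval."""

    def __init__(self) -> None:
        self.files: dict = {}        # local cached copies, path -> bytes
        self.dirty: set = set()      # paths written since the last flush
        self.sent_digest: dict = {}  # path -> SHA-256 of last transmitted copy

    def write(self, path: str, data: bytes) -> None:
        # A later write in the same interval replaces the earlier one, so
        # several changes are consolidated into a single transmission.
        self.files[path] = data
        self.dirty.add(path)

    def read(self, path: str) -> Optional[bytes]:
        # Cached files are served locally, even after they have been sent.
        return self.files.get(path)

    def flush(self) -> dict:
        """Return path -> compressed data for files changed since last flush."""
        outgoing = {}
        for path in sorted(self.dirty):
            data = self.files[path]
            digest = hashlib.sha256(data).hexdigest()
            if self.sent_digest.get(path) == digest:
                continue  # identical to the copy the repository already holds
            outgoing[path] = zlib.compress(data)
            self.sent_digest[path] = digest
        self.dirty.clear()
        return outgoing
```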

The gateway appliance may cache recently written files, which are kept in the local data storage device at the gateway appliance after file transmission. Thus, if a user reads such a file again, they may read it directly from the gateway appliance, rather than having to access the data repository over the communications link. This means that for many file read accesses the user will get full performance (limited by the performance of the gateway appliance) rather than incurring the delay of obtaining files from the remote data repository. Further, because a file is cached locally at the gateway appliance, a user at a computer entity does not need to access the data repository continually to receive files, which again minimizes use of bit rate capacity over the communications link. For file read accesses that are not cached on the gateway appliance, the appliance may request the file from the data repository in compressed format, and read it back (still compressed) over a network connection from the data repository. As the file arrives at the gateway appliance, the gateway appliance decompresses it and makes it available for use on the computer network. Given that no write traffic need be incurred between the data repository and the gateway appliance except at transmission times, a connection may have full bandwidth available for the majority of non-cached file reads. With an ISDN network connection at 128 Kbits/sec and 2:1 compression, the user can read back a non-cached 1 Mbyte file in approximately 40 seconds.
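
The 40-second figure can be checked with a short calculation, assuming 1 Mbyte is taken as 8 x 10^6 bits and ignoring protocol overhead:

```latex
t \approx \frac{(8 \times 10^{6}\ \text{bit}) / 2}{128 \times 10^{3}\ \text{bit/s}}
  = \frac{4 \times 10^{6}\ \text{bit}}{1.28 \times 10^{5}\ \text{bit/s}}
  \approx 31\ \text{s}
```

Taking 1 Mbyte as 2^20 bytes gives about 33 s. The quoted figure of approximately 40 seconds is therefore consistent with the raw line time, presumably with some allowance for link and protocol overhead.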

Configuration data of the gateway appliance is stored at the data repository 201, so that in the event of catastrophic failure of a gateway device, a new gateway device can be installed and reconfigured according to the configuration data retrieved from the data repository 201. The configuration data includes customer-specific settings of a gateway appliance 200.

Sending only blocks of data which have changed since the last transmission between the gateway appliance and the data repository drastically reduces the amount of data which has to be transferred over the communications link between the data repository and the gateway appliance. This enables the gateway appliance to provide a file emulation service to the plurality of networked computers using a communications link of relatively low bit rate capacity.

Blocks of data from a cached file stored at the gateway appliance which are transmitted over the communications link are compressed prior to transmission. So that only the changed portions of a file are compressed and transmitted over the communications link, the gateway appliance must catalog changes in the file and record how it has changed since the previous transmission event.
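
One way to catalog changes is to keep a hash of each fixed-size block of a file as it stood at the last transmission, and to compare against that catalog when the next transmission is due. The Python sketch below assumes 4 KiB blocks and SHA-256 block hashes (neither is specified in the description) and returns the new file length together with the compressed changed blocks, so that a file which has shrunk can also be represented.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed block size


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Catalog of a file at transmission time: one SHA-256 digest per block."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def make_delta(old_hashes: list, new_data: bytes,
               block_size: int = BLOCK_SIZE) -> dict:
    """Compare the current file with the catalog from the previous
    transmission and compress only the blocks that differ or are new."""
    blocks = []
    for index, digest in enumerate(block_hashes(new_data, block_size)):
        if index >= len(old_hashes) or old_hashes[index] != digest:
            offset = index * block_size
            chunk = new_data[offset:offset + block_size]
            blocks.append((offset, zlib.compress(chunk)))
    return {"length": len(new_data), "blocks": blocks}
```

Fixed-size blocks keep the catalog trivial to maintain, at the cost that an insertion near the start of a file shifts every later block and marks it as changed.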

As an alternative to decompressing received partial files representing updates to user files, decompressing the original user file at the data repository, merging the files to obtain a new updated file and then recompressing the new updated file, the data repository may simply file the incoming packages away without any merging or processing. In this case, on retrieval, the data repository may present, upon demand from the gateway appliance, a compressed encrypted package representing an original user file, plus compressed encrypted update packages for that user file. The gateway appliance then has the job of decompressing and decrypting the original user data file, and then incorporating all the updates received from the data repository, after decompression and decryption of those updates, to reconstitute the actual up-to-date user data file.
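
Reconstruction at the gateway appliance can be sketched in Python as below. The update-package layout (a new file length plus a list of offset and compressed-block pairs) is an assumption, and the decryption step is omitted; only decompression and in-order application of the updates, oldest first, are shown.

```python
import zlib


def apply_update(data: bytearray, update: dict) -> bytearray:
    """Write each decompressed block at its offset, then truncate to the
    file length recorded in the (assumed) update package."""
    for offset, packed in update["blocks"]:
        block = zlib.decompress(packed)
        end = offset + len(block)
        if end > len(data):
            data.extend(b"\x00" * (end - len(data)))  # file has grown
        data[offset:end] = block
    del data[update["length"]:]  # no-op unless the file has shrunk
    return data


def reconstruct(base_package: bytes, updates: list) -> bytes:
    """Original compressed file plus update packages -> up-to-date file."""
    data = bytearray(zlib.decompress(base_package))
    for update in updates:  # oldest first
        apply_update(data, update)
    return bytes(data)
```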

Received data packages stored at the data repository representing upgrades to user data files may be purged after a predetermined number of such files have been received. Purging may be performed by combining the earliest versions of the upgrade files. For example, when a predetermined number, e.g. 30, of upgrade files have been received, the earliest upgrade file versions may be merged together in order to avoid storing more than a preset number of upgrade files. Such technology is already applied in conventional backup systems, for example Hewlett Packard Auto Backup systems, and may be applied in the data repository.
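
The purging rule can be sketched as repeatedly merging the two earliest update packages until no more than the preset number remain. The Python sketch assumes each package records the new file length and a mapping from block offset to block contents, with all packages using the same fixed block size, so that a later block at a given offset fully supersedes an earlier one; compression is left out for clarity. With the default limit of 30 this mirrors the example above.

```python
def merge_updates(a: dict, b: dict) -> dict:
    """Combine consecutive updates a (earlier) and b (later) into one.
    Assumes a common fixed block size, so b's block at an offset replaces a's."""
    blocks = {**a["blocks"], **b["blocks"]}
    # Blocks starting beyond the later length would be truncated away anyway.
    blocks = {off: blk for off, blk in blocks.items() if off < b["length"]}
    return {"length": b["length"], "blocks": blocks}


def purge(updates: list, limit: int = 30) -> list:
    """Merge the earliest updates until at most `limit` remain."""
    updates = list(updates)
    while len(updates) > limit:
        updates[0:2] = [merge_updates(updates[0], updates[1])]
    return updates
```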

Referring to Fig. 3 herein, there is illustrated schematically an example of a data packet compiled by gateway device 200 for sending over the internet as a plurality of TCP/IP packets, for receipt by the data repository 201. The data packet comprises a raw user data file 300, which contains the actual data to be stored, and a meta data header 301. Meta data header 301 contains enough information for the gateway appliance 200 to identify the raw data, so that the gateway appliance, in conjunction with the data repository, can search for individual data blocks which have been stored in the data repository.

The meta data 301 is specific to a particular type of operating system of a user. The number and content of the data fields in the meta data are specific to each different operating system supported by the data repository 201.

Referring to Fig. 4 herein, there is illustrated schematically individual data fields within meta data header 301. Individual data fields include a file type data field 400 identifying a file system type, for example whether the network filing system is an NT-type file system, a NetWare-type file system, a Unix-type file system or the like; a long name of the file 401; a short name of the file 402; security attributes of the file, which allow or deny access to particular users of the file, such as an access control list 404 for controlling access to the file, e.g. whether the file is allowed to be read, written or deleted; and a date and time stamp 405 marking the date and time when the file was created and/or the date and time when the file was modified.



The meta data header is a superset of all the possible file attributes which would be available in all the file system types supported in the gateway. For example, supposing the gateway appliance supports just the Windows NT and NetWare file systems, the meta data produced by that gateway appliance would be a superset of the attributes from both those file systems.
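
Forming the header as a superset can be illustrated in Python by taking the union of the attribute sets of every file system the gateway supports. The attribute names below are hypothetical placeholders rather than the real attribute lists of Windows NT or NetWare; fields that do not apply to the originating file system are left empty.

```python
# Hypothetical attribute sets per supported file system (placeholders only).
FS_ATTRIBUTES = {
    "NT": {"long_name", "short_name", "acl", "created", "modified", "hidden"},
    "NetWare": {"long_name", "short_name", "acl", "created", "modified",
                "purge_on_delete"},
}


def header_schema(supported: list) -> list:
    """Superset of the attributes of every supported file system."""
    fields = {"fs_type"}
    for fs in supported:
        fields |= FS_ATTRIBUTES[fs]
    return sorted(fields)


def make_header(schema: list, fs_type: str, attrs: dict) -> dict:
    """Fill the superset header; fields not supplied stay None."""
    unknown = set(attrs) - set(schema)
    if unknown:
        raise ValueError(f"attributes not in schema: {sorted(unknown)}")
    header = dict.fromkeys(schema)
    header.update(attrs)
    header["fs_type"] = fs_type
    return header
```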



The file names are preferably based on the file system of the network from which the file originates. For example, suppose the file system used in the repository is Unix but the file system used on the computer network is DOS. DOS file names can only be 8 characters, with 3 characters for the extension, whereas Unix file names are effectively unlimited. For a transmission file sent from a DOS based computer network, the meta data would therefore carry a DOS name. As another example, supposing the user's computer network operates a Windows NT file system, the gateway appliance emulates a Windows NT file system, and the naming system is therefore based on Windows NT. If the data repository cannot store data files in that format, the information that the file should be seen as a Windows NT file is stored in the meta data header.



The actual name of the transmission file contained in the meta data can also impart information to the data repository. For example, the file names can be used to search data blocks within the data repository to find files which are controlled by a particular gateway appliance.
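
Two of the points above lend themselves to a short Python sketch: a check of whether a name fits the DOS 8.3 length limits, and a repository naming scheme that embeds the gateway identity so that a prefix search finds every file controlled by one gateway appliance. The `gateway/sequence` naming format is an assumption, and the 8.3 check tests only lengths, not the DOS character set.

```python
def fits_dos_83(name: str) -> bool:
    """True if `name` fits the DOS 8.3 length limits (characters not checked)."""
    base, dot, ext = name.partition(".")
    if "." in ext or (dot and not ext):
        return False
    return 1 <= len(base) <= 8 and len(ext) <= 3


def repository_name(gateway_id: str, sequence: int) -> str:
    """Hypothetical naming scheme '<gateway id>/<sequence number>'.
    Assumes gateway ids never contain '/'."""
    return f"{gateway_id}/{sequence:010d}"


def files_for_gateway(names: list, gateway_id: str) -> list:
    """Prefix search for the files controlled by one gateway appliance."""
    prefix = gateway_id + "/"
    return sorted(n for n in names if n.startswith(prefix))
```

Using a separator that cannot occur in a gateway id prevents one gateway's prefix (for example `gw-1`) from matching another's (`gw-17`).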

Referring to Fig. 5 herein, there is illustrated schematically a prior art data storage facility which may be incorporated into data repository 201. The prior art data storage device comprises a high capacity, high reliability bulk data storage unit 500, which may comprise an array of rotating hard disk drives, and a plurality of file servers 501 for managing file handling and configuration of the data storage unit 500, each file server 501 having a gateway port 502 for connecting to a communications link, for example an internet connection. The bulk data storage unit 500 may be based upon a known storage area network (SAN), which comprises a plurality of data storage devices and a fiber channel network. The SAN may easily be scaled up by adding more data storage components to the fiber channel network. However, in the general case, the data storage device 500 could be any distributed networked storage having high reliability, high data storage capacity and scalability, so that the data storage capacity can be expanded easily by adding individual data storage disk drives without significant loss of performance. It will be appreciated by those skilled in the art that technologies such as storage area networks and file server clusters are known in high-end Unix systems used in large corporate networks. Such systems are available from Hewlett Packard Company. The data storage unit 500, file servers 501 and gateway devices 502 are interconnected to provide a high capacity, high reliability data storage repository. Internet connections provided through gateway devices 502 may be added in a scalable manner, depending upon how many customers are to be connected to the cluster. Entry into the cluster through any one of the internet connections at any gateway allows access to any of the individual file servers 501 within the cluster.

Referring to Fig. 6 herein, there is illustrated schematically an architecture of a data repository facility device 201 according to a specific embodiment of the present invention. The data repository facility comprises a bulk data storage unit 601 as hereinbefore described, comprising a plurality of file servers 602 and a plurality of gateway ports 603, which may be configured in a known layout as shown in Fig. 5. The data repository also comprises an operating system 604 comprising a directory structure control module 605 for controlling a structure of file directories within the data storage 601; a management module 406 for managing overall control of the data repository; and a delta block merging module 607.
The operating system 604 in the data repository performs the following main
functions:

• When the operating system receives a data transmission file from a
gateway appliance, the operating system names the file and stores it in
a specific directory in the data storage unit, so that the received data
transmission file is associated with the particular gateway appliance from
which it originated.

• The repository adds its own attributes to the received data transmission
file. These are part of the repository file system and are not necessarily
an integral part of the data transmission file.

• The data repository must be able to maintain security systems for file
access according to a user's security policies on their network.

• In terms of the data repository file system, the raw data is stored in bulk
data blocks assigned to a customer's gateway appliance, and the meta
data is held in a file system as part of the repository file system structure.
For example, there is a directory listing of which files are in the data
repository, which directories they are in, and at which physical blocks on disk the
raw data files are located.
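The following is a minimal sketch of the first, second and fourth functions listed above. It rests on several assumptions. The directory layout `/gateways/<id>/`, the `.dtf` extension and the class names are hypothetical. An in-memory dict stands in for the bulk data blocks. Security policies are not modelled. The received file is stored unchanged, and the repository's own attributes are kept in a separate catalog rather than inside the file.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RepositoryEntry:
    # Repository-side attributes, kept alongside (not inside) the received file.
    path: str
    gateway_id: str
    received_at: float
    size: int


@dataclass
class Repository:
    # Raw data keyed by repository-assigned path (stands in for bulk data blocks).
    blobs: dict = field(default_factory=dict)
    # Metadata held separately, as part of the repository's own file system.
    catalog: dict = field(default_factory=dict)
    # Per-gateway sequence counters used to name incoming files.
    counters: dict = field(default_factory=dict)

    def receive(self, gateway_id: str, payload: bytes) -> str:
        """Name a received transmission file and file it under its gateway's directory."""
        seq = self.counters.get(gateway_id, 0)
        self.counters[gateway_id] = seq + 1
        path = f"/gateways/{gateway_id}/{seq:08d}.dtf"
        self.blobs[path] = payload  # the transmission file itself is not modified
        self.catalog[path] = RepositoryEntry(path, gateway_id, time.time(), len(payload))
        return path

    def files_for(self, gateway_id: str) -> list:
        """Directory listing of the files that originated from one gateway appliance."""
        return sorted(p for p, e in self.catalog.items() if e.gateway_id == gateway_id)
```

Because the path encodes the originating gateway, a directory listing alone shows which appliance each file came from.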

In the data repository, individual blocks of data can be configured to be
viewed by a user as belonging to any particular type of operating system. For
example, a first block of data may be configured to be viewed as an NT file
system, and a second block of data may be viewed by a user as a NetWare filing
system. From the user's point of view, the data blocks are expandable in terms
of memory size, whilst keeping the same file structure.

From the point of view of the service provider running and managing the
data repository, the service provider does not want to be involved directly in how
the data storage is used by the plurality of users. In particular, the service
provider does not want the system overhead of deciding which file system types
and sizes a user of the data repository requires, does not want to become
involved in determining what authorizations different individuals within a
corporation have in using a block of data storage allocated to a corporate user, and
does not want to become involved in the details of the information security policies of
individual corporate users. The data repository may be handling up to petabytes of data;
therefore any management of the data storage space by the service provider is
likely to give the service provider higher administration costs.

To address the problem of management of data within the data repository,
in the best mode according to the present invention, configuration of data storage
space is, as far as possible, put under the control of users of the client computer
networks by virtue of its handling by the customer's gateway appliance, with, as
far as possible, management of data storage space at the data repository being
limited to serving out blocks of data storage. The repository needs to be able to
handle allocation of data storage space to individual users, and storage of data
blocks in that space, whereas the gateway appliance needs to be able to present
the remote data storage facility to users in a file structure compliant with the file
system of the operating system on the local area network. Because of the
limitations of the communications link, transfer of data over the communications
link requires compression of data. This is done at the level of individual blocks of
data.
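The per-block compression described above can be sketched with Python's standard `zlib` module. The 64 KiB block size is a hypothetical choice, and the text does not name a compression algorithm, so DEFLATE (via `zlib`) is a stand-in. Each block is compressed independently of its neighbours.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size; not specified in the text


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-size blocks and compress each one independently."""
    return [zlib.compress(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def decompress_blocks(blocks: list) -> bytes:
    """Reverse compress_blocks: decompress each block and concatenate the results."""
    return b"".join(zlib.decompress(b) for b in blocks)
```

Compressing blocks independently means that a single block can be retransmitted, or retrieved and decompressed, without touching the rest of the file. The cost is a slightly lower compression ratio than compressing the whole stream at once.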

Data management module 606 monitors how much data storage space
each individual customer is using, and can calculate invoices according to how
much data storage space is being used.

Referring to Fig. 7 herein, there is illustrated schematically a file structure
applied within data repository 201. Each gateway appliance 200 of each user is
allocated a data block 700, 701 reserved for exclusive use of the corresponding
gateway appliance. Within the data block 700, individual received
data transmission packets are stored in locations which are allocated by
management module 606. The locations may be allocated sequentially,
depending upon a date and timestamp of the data packets received from the
gateway appliance. Directory structure control module 605 maintains a database
listing:

• Locations of data blocks assigned to each of a plurality of gateway
appliances

• Within those data blocks, locations of individual data packets received
from that gateway appliance

Data packets are stored in and retrieved from the data storage area by
management module 606, which is able to locate those data packets by
reference to the internal location database stored in the directory structure control
module 605.
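The location database above can be modelled as a map from gateway to block, where each block holds a map from packet to address. This is a hypothetical sketch: storage is measured in abstract "units", packets are `(packet_id, timestamp, length)` tuples, and the class and method names are illustrative only. Packets are given sequential locations in timestamp order, as the text suggests.

```python
from dataclasses import dataclass, field


@dataclass
class DataBlock:
    start: int                  # first storage unit of the block in bulk storage
    capacity: int               # storage units reserved for this gateway
    next_free: int = 0          # offset of the next unallocated unit
    packets: dict = field(default_factory=dict)  # packet_id -> (address, length)


class LocationDatabase:
    """Hypothetical stand-in for the database kept by module 605."""

    def __init__(self):
        self.blocks = {}  # gateway_id -> DataBlock

    def assign_block(self, gateway_id, start, capacity):
        self.blocks[gateway_id] = DataBlock(start, capacity)

    def place_packets(self, gateway_id, packets):
        """Give (packet_id, timestamp, length) tuples sequential locations in timestamp order."""
        block = self.blocks[gateway_id]
        needed = sum(length for _, _, length in packets)
        # Check the whole batch first so a failure leaves the block unchanged.
        if block.next_free + needed > block.capacity:
            raise ValueError(f"data block for {gateway_id} is full")
        for packet_id, _, length in sorted(packets, key=lambda p: p[1]):
            block.packets[packet_id] = (block.start + block.next_free, length)
            block.next_free += length

    def locate(self, gateway_id, packet_id):
        """Look up where a packet is stored, as module 606 would before retrieval."""
        return self.blocks[gateway_id].packets[packet_id]

    def usage(self, gateway_id):
        """Storage units occupied by this gateway's packets."""
        return self.blocks[gateway_id].next_free
```

Because each gateway's packets live in one block, per-customer usage is a single lookup.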



One reason for grouping the files in the manner shown in Fig. 7 is so that a 
service provider can see how much data storage space a particular customer is 
using. 


Referring to Fig. 8 herein, there is illustrated schematically a method for
setting up a new data block 700 for a new gateway appliance. In step 800, a human
operator accessing management module 606 via a user interface comprising a
visual display, keyboard and pointing device, for example a mouse, creates a
new data block 700 from a dropdown menu presented on screen and generated
by management module 606. In step 801, management module 606 enters
gateway appliance identifier data, identifying the customer's gateway appliance,
into the database. In step 802, within the database, a plurality of individual file
locations are allocated, corresponding to a plurality of individual file locations in
the data storage block 700.
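Steps 800 to 802 can be sketched as a single function, under stated assumptions. The operator's menu-driven interface is out of scope here. A plain dict stands in for the database. File locations are modelled as consecutive integers. `BlockRecord` and `create_data_block` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class BlockRecord:
    gateway_id: str                                     # step 801: gateway identifier
    file_locations: list = field(default_factory=list)  # step 802: allocated locations


def create_data_block(database: dict, gateway_id: str,
                      first_location: int, n_locations: int) -> BlockRecord:
    """Steps 800-802: create a block, record the gateway identifier, allocate locations."""
    if gateway_id in database:
        raise ValueError(f"gateway {gateway_id} already has a data block")
    record = BlockRecord(gateway_id)
    record.file_locations = list(range(first_location, first_location + n_locations))
    database[gateway_id] = record
    return record
```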




If a customer requires more data storage, then using the management
module 606, a human operator at the data repository 600 can simply create more
database entries corresponding to more file locations in the bulk data storage
block, thereby increasing the size of the data block available to the customer.
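Enlarging a customer's block, as described above, amounts to moving further location entries into that customer's list. This sketch assumes a shared pool of free location numbers in bulk storage. Both the pool and the function name `grow_block` are hypothetical.

```python
def grow_block(file_locations: list, free_pool: list, extra: int) -> list:
    """Enlarge a customer's block by moving `extra` locations from the free pool into it."""
    if extra > len(free_pool):
        raise ValueError("not enough free locations in bulk storage")
    taken = free_pool[:extra]
    del free_pool[:extra]           # these locations are no longer free
    file_locations.extend(taken)    # new database entries for the customer's block
    return file_locations
```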


Referring to Fig. 9 herein, there is illustrated schematically handling of a
data transmission block by the operating system 604 of the data repository. In
step 900, the repository receives a data transmission block from any one of the
plurality of gateway appliances which the repository serves. In step 901, the
management module 606 reads the meta data header on the received data
transmission block, and in step 902, reads the file type data, file name data and
date/time stamp data of the meta header, and passes these to the directory
structure control module 605. In step 903, the directory structure control module
605 stores file location data and time stamp data in a database location
corresponding to the individual customer from which the data transmission file
has been received. In step 904, a data storage location in the repository data
storage area is allocated to the transmission file received from the customer.
In step 905, the received data transmission file is stored in the data location
allocated to the customer, according to the file structure as illustrated with
reference to Fig. 7 herein.
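The steps above can be sketched as one handler. This section does not specify the header encoding, so the framing here is a hypothetical one: a 4-byte big-endian header length, then a UTF-8 JSON header, then the raw data. Location allocation is a simple counter. Allocation is done before the index entry is written so that the location is known when it is recorded, which is why step 904 appears before step 903 in the code.

```python
import json
import struct


def handle_transmission(raw: bytes, customer_id: str, index: dict, store: dict) -> int:
    """Steps 900-905 for one received data transmission block (hypothetical framing)."""
    # Step 901: read the meta data header, assumed to be a 4-byte big-endian
    # length followed by a UTF-8 JSON object.
    (header_len,) = struct.unpack(">I", raw[:4])
    header = json.loads(raw[4:4 + header_len].decode("utf-8"))
    # Step 902: pick out the file type, file name and date/time stamp.
    entry = {
        "file_type": header["file_type"],
        "file_name": header["file_name"],
        "timestamp": header["timestamp"],
    }
    # Step 904: allocate the next free storage location (a simple counter here).
    location = len(store)
    # Step 903: record location and time stamp under this customer's entry.
    entry["location"] = location
    index.setdefault(customer_id, []).append(entry)
    # Step 905: store the whole data transmission file at the allocated location.
    store[location] = raw
    return location
```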

Referring to Fig. 10 herein, there is illustrated schematically an architecture
of a gateway appliance 200. Gateway appliance 200 comprises a hardware
platform 1000 and an operating system 1001. Hardware platform 1000
comprises an amount of local data storage in the form of one or a plurality of hard
disk drives 1001; a processor 1002; an associated random access memory 1003;
a local area network port 1004; and a communications link port 1005 for
connecting, for example, with the internet. The operating system, in addition to a
conventional operating system such as Unix, Windows or the like, comprises a
gateway application 1006 comprising a manageability control module 1007; a
performance caching module 1008; and a bandwidth control module 1009.
The gateway application 1006 operates to: emulate a file system
corresponding to a file system of a network of computer entities to which the
gateway appliance is connected; cache data files from the network, prior to
sending data files to the data repository, so that often used files can be held
locally at the gateway appliance between data storage operations; apply
conversion of user data files from file system dependent format to file system
independent format, so that file system independent format data is sent to the
data repository, whilst file type dependent data is communicated to the network
computer entities; and compress/decompress data prior to and after transmission
over the communications link.
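The caching function above can be sketched as a size-bounded cache. The text says only that often used files are held locally, so least-recently-used eviction is a stand-in policy. The byte capacity and the `fetch_remote` callable, which represents retrieval from the data repository, are also hypothetical.

```python
from collections import OrderedDict


class GatewayCache:
    """Size-bounded LRU cache of files held locally at the gateway (hypothetical)."""

    def __init__(self, capacity_bytes: int, fetch_remote):
        self.capacity = capacity_bytes
        self.fetch_remote = fetch_remote  # callable: name -> bytes, fetches from the repository
        self.files = OrderedDict()        # name -> data, least recently used first
        self.used = 0

    def get(self, name: str) -> bytes:
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            return self.files[name]
        data = self.fetch_remote(name)    # cache miss: go over the communications link
        self.put(name, data)
        return data

    def put(self, name: str, data: bytes) -> None:
        if name in self.files:
            self.used -= len(self.files.pop(name))
        self.files[name] = data
        self.used += len(data)
        # Evict least recently used files, but always keep the newest one.
        while self.used > self.capacity and len(self.files) > 1:
            _, evicted = self.files.popitem(last=False)
            self.used -= len(evicted)
```

A hit is served locally without using the communications link. Only misses call `fetch_remote`.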

Referring to Fig. 11 herein, there is illustrated schematically a first method of
operation of gateway appliance 200. In step 1100, a user stores a file at a local
client computer within the user network, in accordance with the operating system
of that network. Data is received from the network client computer entity by the
gateway appliance in step 1100 over the local area network. In step 1101, the
gateway appliance interrogates the operating system for the file name, file type,
and security data relating to the file, and generates file name data, file system
type and file type data, and security data. In step 1102, the gateway appliance
compiles a meta data header, filling in the individual data fields for file system and
file type, long name of file, short name of file, security attributes of the file, and
access control to the file, and applies a date and time stamp to the file. In step
1103, the gateway appliance appends the meta data header to the raw data file
to create a data transmission file as illustrated in Fig. 4 herein. In step 1104, the
data transmission file is passed down to a transport layer within the gateway
appliance, and may be sent over the internet connection either as a TCP/IP
packet stream or as a series of ATM cells, as is known in the art. In step 1105, the
transmission file is sent over the network connection in the selected protocol, e.g.
TCP/IP or ATM.
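Steps 1102 and 1103 can be sketched as building a header from the listed fields and framing it with the raw data. The field names follow the text. The layout of Fig. 4 is not given in this section, so the encoding is an assumption: JSON, with a 4-byte length prefix, placed ahead of the raw data. The sketch also includes a matching split function.

```python
import json
import struct
from datetime import datetime, timezone


def build_transmission_file(raw: bytes, *, file_system: str, file_type: str,
                            long_name: str, short_name: str,
                            security: dict, access_control: list) -> bytes:
    """Steps 1102-1103: compile a meta data header and combine it with the raw data."""
    header = {
        "file_system": file_system,
        "file_type": file_type,
        "long_name": long_name,
        "short_name": short_name,
        "security": security,
        "access_control": access_control,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # date and time stamp
    }
    encoded = json.dumps(header).encode("utf-8")
    # Hypothetical framing: 4-byte big-endian header length, header, then raw data.
    return struct.pack(">I", len(encoded)) + encoded + raw


def split_transmission_file(blob: bytes):
    """Inverse of build_transmission_file: return (header dict, raw data)."""
    (hlen,) = struct.unpack(">I", blob[:4])
    return json.loads(blob[4:4 + hlen].decode("utf-8")), blob[4 + hlen:]
```

For step 1105 over TCP/IP, the resulting bytes could be written to a connected socket with `socket.sendall`. ATM transport is not covered by this sketch.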
Referring to Fig. 12 herein, there is illustrated schematically the file type
data 400 contained in the meta data header 301. The file type data comprises a
name and address field 1200 containing a logical address of the gateway
appliance originating the data transmission block; a network settings field 1201,
which stores all the settings of the user's network, for example security
authorizations, assignment of printers to individual computer entities connected to
internet services and the like; an emulation file system configuration field
1202 containing data describing how the gateway appliance is configured to
emulate a particular file system configuration, for example a Windows NT-based
file system or a Unix-based file system; and a cyclical redundancy code check
1203 for recovering any of the name and address field, network settings field or
emulation field data in the event of data corruption of the file either during
transmission or as a result of storage in the data repository.
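The following sketches fields 1200 to 1203 using CRC-32 from Python's `zlib`. The JSON serialization and field names are assumptions. A CRC detects corruption but does not by itself correct it. In practice, "recovering" the fields after a failed check would mean requesting a retransmission or reading another stored copy.

```python
import json
import zlib


def pack_file_type_data(address: str, network_settings: dict, emulation: dict) -> bytes:
    """Serialize fields 1200-1202 and append a CRC-32 over them (field 1203)."""
    body = json.dumps({"address": address,
                       "network_settings": network_settings,
                       "emulation": emulation}, sort_keys=True).encode("utf-8")
    return body + zlib.crc32(body).to_bytes(4, "big")


def unpack_file_type_data(blob: bytes) -> dict:
    """Verify the CRC before trusting the fields; raise if the data was corrupted."""
    body, stored = blob[:-4], int.from_bytes(blob[-4:], "big")
    if zlib.crc32(body) != stored:
        raise ValueError("file type data failed CRC check")
    return json.loads(body.decode("utf-8"))
```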

Referring to Fig. 13 herein, data management module 606 comprises a
policy data table 1300, which stores policy data for each of a plurality of
customers. Such policy data may include, for example, a maximum amount of
data storage space which a customer has contracted to use in the data
repository. Data allocation module 1301 allocates data storage to individual
customers as data packets are received from those customers. Monitoring
module 1302 monitors the allocation of data storage space in the repository to
individual customers. If a customer attempts to exceed their data storage
allocation by sending data storage packets which would cause overflow of their
allocated data storage space, the data storage monitoring module 1302, having
knowledge of the maximum capacity allocated to that customer by reading policy
data 1300, may generate a 'refuse storage' message which refuses storage of the
next incoming data packet from a customer where this would cause overflow of
that customer's allocated data storage block.
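The policy table, allocation module and monitoring module described above can be sketched together as follows. Policies hold only the contracted maximum in bytes, and packet sizes are given in bytes. `Policy`, `StorageMonitor` and the return strings are hypothetical, apart from the 'refuse storage' message named in the text.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    max_bytes: int  # contracted maximum storage for the customer (policy table 1300)


class StorageMonitor:
    """Hypothetical model of policy table 1300, allocation 1301 and monitoring 1302."""

    def __init__(self, policies: dict):
        self.policies = policies  # customer_id -> Policy
        self.used = {}            # customer_id -> bytes allocated so far

    def offer(self, customer_id: str, packet_size: int) -> str:
        """Allocate space for an incoming packet unless it would overflow the allocation."""
        limit = self.policies[customer_id].max_bytes
        current = self.used.get(customer_id, 0)
        if current + packet_size > limit:
            return "refuse storage"  # packet refused; nothing is allocated
        self.used[customer_id] = current + packet_size
        return "stored"
```

A refused packet does not consume space, so a later, smaller packet that still fits can be accepted.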

Billing module 1303 may calculate an invoice amount for which a customer
is to be invoiced, which depends upon the amount of data storage space that
customer has used and the time period over which that data storage space has
been used. Bearing in mind that files may be stored or retrieved at any time, a
unit of calculation upon which a monetary value of invoicing is calculated may be
gigabyte minutes; that is to say, storing 1 gigabyte of customer data for 1 minute
incurs a monetary charge.
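Gigabyte minutes can be computed by integrating stored volume over time. This sketch assumes usage is recorded as a time-ordered list of changes, `(minute, gigabytes stored from then on)`, and stays constant between changes. The per-gigabyte-minute rate is a hypothetical parameter.

```python
def gigabyte_minutes(changes: list, period_end: float) -> float:
    """Integrate stored gigabytes over time.

    `changes` is a time-ordered list of (minute, gigabytes_stored_from_then_on);
    usage is assumed constant between consecutive changes.
    """
    total = 0.0
    # Pair each change with the next one (or the end of the billing period).
    for (t, gb), (t_next, _) in zip(changes, changes[1:] + [(period_end, 0.0)]):
        total += gb * (t_next - t)
    return total


def invoice_amount(changes: list, period_end: float, rate_per_gb_minute: float) -> float:
    return gigabyte_minutes(changes, period_end) * rate_per_gb_minute
```

For example, storing 2 GB for the first 30 minutes and 5 GB for the next 30 gives 2 × 30 + 5 × 30 = 210 gigabyte minutes.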

Referring to Fig. 14, there is illustrated schematically operation of the
operating system 604 of the data repository for managing data storage capacity
of a customer A. In step 1400, on receiving a data packet from customer A,
policy database 1300 is read to find out what policies are applied to a data
storage block corresponding to customer A. In step 1401, the amount of data
already occupying the data block of customer A, from data packets received from
customer A, is read. In step 1402, the data packet, which is stored in a buffer as it
is received, is read, and if the addition of the data packet to the existing data in
customer A's data block would exceed the allowed size of customer A's data block,
then in step 1403 it is checked from the policy database 1300 whether a reserve
data storage facility is available for customer A. If a reserve data storage facility
is not available, then in step 1404 the repository refuses to store the incoming
data packet and sends a message to the gateway appliance of customer A
informing it that storage of the packet would exceed the agreed data storage
amount. If customer A does have a reserve facility, then in step 1405 the size of
the data block allocated to customer A is increased, and in step 1406 a message
is sent to the gateway appliance of customer A that the reserve data storage
facility is being used. In step 1407, the data packet is stored in the now enlarged
data block allocated to customer A. However, if in step 1402 storage of the
incoming data packet would not exceed the available free space within the
data block for customer A, then the data packet is stored in that data
block as herein described.
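The flowchart steps above can be sketched as one decision function. This rests on two assumptions. First, the reserve facility is modelled as a number of extra bytes the customer may draw on. Second, the block grows by exactly the shortfall, whereas the text says only that its size is increased. `CustomerPolicy`, `store_packet` and the `notify` callback, which stands in for messages to the gateway appliance, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CustomerPolicy:
    block_size: int   # current size of the customer's data block, in bytes
    reserve: int = 0  # extra capacity the customer may draw on (0 = no reserve facility)


def store_packet(policy: CustomerPolicy, used: int, packet_size: int, notify) -> int:
    """Steps 1400-1407: return the new amount used; `notify` messages the gateway."""
    if used + packet_size > policy.block_size:                     # step 1402
        shortfall = used + packet_size - policy.block_size
        if shortfall > policy.reserve:                             # step 1403
            notify("storage refused: agreed data storage amount would be exceeded")  # step 1404
            return used                                            # packet not stored
        policy.block_size += shortfall                             # step 1405: enlarge block
        policy.reserve -= shortfall
        notify("reserve data storage facility is being used")      # step 1406
    return used + packet_size                                      # step 1407, or normal store
```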




